Social Data Analysis and Visualization - Assignment 2¶

Introduction¶

Vandalism is damage to public or private property. It has been a serious problem in San Francisco for a long time. For example, graffiti on walls in the Mission District and broken windows in SOMA cost the city large sums of money each year. They also reduce the sense of safety in local neighborhoods. Police data from 2003 to 2024 show that vandalism in San Francisco may have changed with local policy shifts and community efforts.

In 2006, then-Mayor Gavin Newsom started the “Graffiti Watch” program[1]. It asked residents to report graffiti incidents and restore damaged public spaces. In 2015, reports of serious vandalism such as broken car windows went up. Then, law enforcement targeted repeat offenders[2][3]. But these actions were not the same across all areas. Some grassroots groups and neighborhood coalitions also formed. They tried to involve young people and create mural projects that turned graffiti into approved street art.

In 2018, the city introduced stricter rules. They required property owners to remove graffiti within 48 hours or pay fines[4]. Some people say this led to a quick rise in recorded vandalism because more cases were reported. Others believe large events and more tech workers influenced these changes. During the COVID-19 pandemic in 2021–2022, certain kinds of vandalism went up in some districts[5].

These social and policy factors guide our data exploration. For this project, we use crime reports from 2003 to 2024 to answer key questions:

  • When does vandalism happen most often (by hour, day, or season)?
  • Where are the main hotspots, and how have they shifted over time?
  • Did policy changes or community efforts affect vandalism rates?

Our analysis has three main visual displays:

  • A time-series chart to show hourly and weekday trends,
  • A heatmap to show month-to-month changes,
  • An interactive Bokeh chart that lets readers explore district-level patterns by year.

Necessary Imports and reading the merged csv of the SF crime data¶

In [84]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import string
import re
import calplot
import logging
import folium
from folium.plugins import HeatMapWithTime
import plotly.express as px
from scipy.stats import probplot
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, Select, CustomJS, CheckboxGroup, CDSView, GroupFilter, Legend, LegendItem
from bokeh.layouts import column, row
from bokeh.palettes import Category20, Turbo256
import calendar
plt.rcParams["font.family"] = "Georgia" #Setting the font to Georgia for all plots
In [12]:
data=pd.read_csv('merged_data.csv') #reading the csv
print("Shape of DataFrame:", data.shape)  
data.head()
Shape of DataFrame: (1729370, 11)
Out[12]:
Incident Code Category Latitude Longitude TimeOfDay DayOfWeek DayOfMonth Month Year PdDistrict Resolution
0 3074 ROBBERY 37.708311 -122.420084 17 Monday 22 November 2004 INGLESIDE NONE
1 7021 VEHICLE THEFT 90.000000 -120.500000 20 Tuesday 18 October 2005 PARK NONE
2 7021 VEHICLE THEFT 90.000000 -120.500000 2 Sunday 15 February 2004 SOUTHERN NONE
3 4134 ASSAULT 37.770913 -122.410541 17 Sunday 21 November 2010 SOUTHERN NONE
4 4134 ASSAULT 37.745158 -122.470366 15 Tuesday 2 April 2013 TARAVAL NONE
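
Two of the rows shown above carry the coordinates (90.0, -120.5), which cannot be a location in San Francisco. Such rows would still be passed to any map built from `Latitude` and `Longitude`. The following is a minimal sketch of one way to drop them before mapping. The bounding-box values and the helper name `drop_out_of_bounds` are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical bounding box roughly enclosing San Francisco.
# These limits are an assumption, not taken from the dataset documentation.
SF_LAT_RANGE = (37.70, 37.84)
SF_LON_RANGE = (-122.52, -122.35)


def drop_out_of_bounds(df, lat_col="Latitude", lon_col="Longitude"):
    """Return a copy with only the rows whose coordinates fall inside the box."""
    in_lat = df[lat_col].between(*SF_LAT_RANGE)
    in_lon = df[lon_col].between(*SF_LON_RANGE)
    return df[in_lat & in_lon].copy()


# Tiny illustrative frame mirroring three of the rows shown above
sample = pd.DataFrame({
    "Category": ["ROBBERY", "VEHICLE THEFT", "ASSAULT"],
    "Latitude": [37.708311, 90.0, 37.770913],
    "Longitude": [-122.420084, -120.5, -122.410541],
})

cleaned = drop_out_of_bounds(sample)
print(len(sample), "->", len(cleaned))
```

On the full dataset the same call would be `drop_out_of_bounds(data)`. How many rows it removes depends on the limits chosen.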

A Short Data Story: Trends in Vandalism over time across the SF Police Districts¶

Part 1: Time Series Plot to show hourly occurrences of Vandalism for each day of the week¶

In [15]:
# defining the order of days of the week
days_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

plt.figure(figsize=(12, 6))

# Looping through each day of the week and counting the hourly crime occurrences
for day in days_order:
    hourly_counts = data[(data['Category'] == 'VANDALISM') & (data['DayOfWeek'] == day)]['TimeOfDay'].value_counts().sort_index()
    
    hourly_counts = hourly_counts.reindex(range(24), fill_value=0)
 
    plt.plot(hourly_counts.index, hourly_counts.values, label=day, marker='o')

plt.xticks(ticks=range(24), labels=[f"{i:02d}" for i in range(24)])

plt.title('Vandalism Crime Occurrences by Hour of the Day for each Day of the Week')
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Occurrences')
plt.legend(title="Day of the Week", bbox_to_anchor=(1, 1), loc='upper left')
plt.grid(True, linestyle='--', alpha=0.6)

plt.show()

Figure 1: The time-series chart above shows the number of vandalism incidents in SF at each hour of the day, for each day of the week. Counts are lowest in the early morning (4am-6am), then rise to a daytime peak at 12pm on every day. After a small dip, they climb again to the highest peak at 6pm. Counts stay high through the evening and start dropping steadily after midnight. Friday evenings, Saturdays and Sundays generally have more occurrences than workdays. This observation is consistent with research[6] and might be linked to more public gatherings or nightlife on weekends.
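
The peaks described for Figure 1 can also be checked numerically instead of being read off the chart. The sketch below builds a day-by-hour count table with `pd.crosstab` and looks up the busiest hour for each day. The helper name `hourly_by_day` and the toy rows are assumptions for illustration. The column names match those in the dataset preview.

```python
import pandas as pd

DAYS_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]


def hourly_by_day(df, category="VANDALISM"):
    """Count incidents of one category per weekday (rows) and hour (columns)."""
    subset = df[df["Category"] == category]
    table = pd.crosstab(subset["DayOfWeek"], subset["TimeOfDay"])
    # Make sure every weekday and every hour 0-23 is present, even with no cases
    return table.reindex(index=DAYS_ORDER, columns=range(24), fill_value=0)


# Toy data standing in for the real DataFrame (hypothetical values)
sample = pd.DataFrame({
    "Category": ["VANDALISM", "VANDALISM", "VANDALISM", "ROBBERY"],
    "DayOfWeek": ["Friday", "Friday", "Sunday", "Friday"],
    "TimeOfDay": [18, 18, 12, 18],
})

counts = hourly_by_day(sample)
peak_hours = counts.idxmax(axis=1)  # busiest hour for each weekday
print(peak_hours)
```

Note that `idxmax` returns the first hour (0) for a day with no incidents at all, so days with empty rows should be read with care.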

Heatmap movie showing changes in Vandalism occurrences in SF per month over the years¶

In [18]:
# Filtering for VANDALISM data
vandalism_data = data[data["Category"].str.upper() == "VANDALISM"].copy()

# Month mapping
month_mapping = {month: idx for idx, month in enumerate([
    "January", "February", "March", "April", "May", "June", 
    "July", "August", "September", "October", "November", "December"], start=1)}

vandalism_data["Month_Num"] = vandalism_data["Month"].map(month_mapping)

# Sorting
vandalism_data = vandalism_data.sort_values(["Year", "Month_Num"])

# Creating time series data
time_series_data = []
time_labels = []

# Grouping data by Year and Month
for (year, month), group in vandalism_data.groupby(["Year", "Month_Num"]):
    heat_data = group[["Latitude", "Longitude"]].dropna().values.tolist()
    time_series_data.append(heat_data)
    month_name = list(month_mapping.keys())[list(month_mapping.values()).index(month)]
    time_labels.append(f"{year} - {month_name}")

# Creating SF Folium map
sf_map = folium.Map(location=[37.7749, -122.4194], zoom_start=12)

# Adding HeatMapWithTime
HeatMapWithTime(
    time_series_data,
    index=time_labels,
    auto_play=True,
    max_opacity=0.8,
    radius=10
).add_to(sf_map)

# Plot title
title_html = """
<h3 align="center" style="font-size:16px"><b>Vandalism Crimes' Heatmap Over Time across SF Districts</b></h3>
"""
sf_map.get_root().html.add_child(folium.Element(title_html))

# Displaying the map
sf_map
Out[18]:

Figure 2: The movie above shows vandalism occurrences in SF for each month across all years (2003-2024). Northern, Ingleside and Mission appear to be the hotspots, recording high numbers throughout. Taraval, Richmond and Park have had relatively few occurrences. After 2017, activity increased in districts such as Mission, Tenderloin and Central. The overall number of cases across the districts seems to have increased in recent years. This observation is consistent with research[7]. The map suggests that hotspots can shift over time, perhaps because of gentrification, better security, or community programs.
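
Hotspot claims like those in Figure 2 are hard to verify from an animation alone. A table of counts per district and year gives a simple cross-check. The sketch below is one possible way to build it. The helper name `district_year_counts` and the toy rows are assumptions for illustration only.

```python
import pandas as pd


def district_year_counts(df, category="VANDALISM"):
    """Return a district x year table of incident counts for one category."""
    subset = df[df["Category"].str.upper() == category]
    return (subset.groupby(["PdDistrict", "Year"])
                  .size()
                  .unstack(fill_value=0))


# Toy data standing in for the real DataFrame (hypothetical values)
sample = pd.DataFrame({
    "Category": ["VANDALISM"] * 5 + ["ASSAULT"],
    "PdDistrict": ["MISSION", "MISSION", "MISSION", "PARK", "NORTHERN", "PARK"],
    "Year": [2016, 2018, 2018, 2016, 2018, 2018],
})

counts = district_year_counts(sample)
# Districts ranked by total count over all years
ranking = counts.sum(axis=1).sort_values(ascending=False)
print(counts)
print(ranking)
```

Applied to the full `data` frame, the rows of `counts` would show whether a district's yearly totals rise or fall over time, and `ranking` would list the districts from most to fewest incidents.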

Interactive Visualization for monthly occurrences of Vandalism across SF districts over the years¶

In [91]:
output_notebook()

# Preparing dataset
vandalism_df = data[data["Category"].str.upper() == "VANDALISM"].copy()

# Month mapping
if vandalism_df["Month"].dtype == "object":
    vandalism_df["Month"] = vandalism_df["Month"].apply(
        lambda x: list(calendar.month_name).index(x) if x in calendar.month_name else x
    )

# Grouping and normalizing the data
monthly_grouped = vandalism_df.groupby(["PdDistrict", "Year", "Month"]).size().reset_index(name="count")
monthly_grouped["total"] = monthly_grouped.groupby(["PdDistrict", "Month"])["count"].transform("sum")
monthly_grouped["rel_freq"] = monthly_grouped["count"] / monthly_grouped["total"]

# Creating pivot table for plotting
pivot = monthly_grouped.pivot_table(index=["PdDistrict", "Month"], columns="Year", values="rel_freq", fill_value=0)
pivot.reset_index(inplace=True)

# Generating month labels
month_labels = [calendar.month_abbr[m] for m in range(1, 13)]
pivot["MonthLabel"] = pivot["Month"].map(dict(zip(range(1, 13), month_labels)))

# Formatting columns
pivot.columns = [str(c) if isinstance(c, int) else c for c in pivot.columns]
years = sorted([col for col in pivot.columns if col.isdigit()])

# Defining colors palettes
districts = pivot["PdDistrict"].unique()
bar_colors = Category20[20] if len(years) <= 20 else Category20[20] * (len(years) // 20 + 1)
line_colors = Turbo256[::len(Turbo256)//len(years)][:len(years)]

# Creating ColumnDataSources per district
district_sources = {
    district: ColumnDataSource(pivot[pivot["PdDistrict"] == district].copy()) for district in districts
}

# Creating the figure
p = figure(x_range=month_labels, height=500, width=1000,
           title="Monthly Occurrences of Vandalism across SF Districts (2003–2024)", y_range=(0, 0.1), toolbar_location=None)
p.y_range.start = 0
p.xaxis.axis_label = "Month"
p.yaxis.axis_label = "Relative Frequency"
p.title.text_font_size = "16pt"

# Plotting bars
renderers_district = []
legend_items_district = []

for i, district in enumerate(districts):
    source = district_sources[district]
    r = p.vbar(x="MonthLabel", top=years[0], width=0.5,  # bars use the first year's column
               source=source, fill_color=bar_colors[i % len(bar_colors)],
               fill_alpha=0.5, line_color=None)
    r.visible = (i == 0)
    renderers_district.append(r)
    legend_items_district.append(LegendItem(label=district, renderers=[r]))

# Plotting one line per year (all drawn from the first district's data source)
line_renderers = []
legend_items_year = []
example_source = district_sources[districts[0]]

for i, year in enumerate(years):
    r = p.line(x="MonthLabel", y=year, source=example_source,
               line_width=2, color=line_colors[i % len(line_colors)], alpha=0.9)
    r.visible = False
    line_renderers.append(r)
    legend_items_year.append(LegendItem(label=str(year), renderers=[r]))

# Adding legends
legend_district = Legend(items=legend_items_district, location="center", title="District")
legend_year = Legend(items=legend_items_year, location="center", title="Year", label_text_font_size="8pt")

p.add_layout(legend_district, 'right')
p.add_layout(legend_year, 'right')
p.legend.click_policy = "hide"

# Displaying plot
show(p)
Loading BokehJS ...

Figure 3: The interactive visualization above shows normalized vandalism occurrences for each month of the year. The Y-axis represents the relative frequency of occurrences. The bars show monthly variation for each SF district, and the lines show the monthly trend for each year. As coded, the bars plot the first year's column and the lines are drawn from the first district's data source.

It can be observed that districts such as Ingleside, Mission and Northern have recorded more occurrences than other SF districts. Southern, Bayview and Central have had comparatively low counts. The line plots show that in some years the summer months recorded higher counts than the winter months. 2009 and 2011 had quite high counts in December, and 2017 had higher counts in July-August than most other years. This observation is consistent with research[8].
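
In the cell above, `rel_freq` divides each count by the total for the same district and calendar month across all years. A value therefore reads as the share of that district's incidents in a given calendar month that fell in one particular year. The sketch below reproduces this on made-up numbers. It also shows an alternative that divides by the total per district and year, so that the monthly shares within one year sum to 1. The alternative and the column names `share_of_month` and `share_of_year` are assumptions for comparison, not the notebook's method.

```python
import pandas as pd

# Made-up counts for one district, two years and two months
toy = pd.DataFrame({
    "PdDistrict": ["MISSION"] * 4,
    "Year": [2003, 2003, 2004, 2004],
    "Month": [1, 2, 1, 2],
    "count": [10, 30, 30, 30],
})

# Same grouping as the notebook cell: total per district and calendar month
month_total = toy.groupby(["PdDistrict", "Month"])["count"].transform("sum")
toy["share_of_month"] = toy["count"] / month_total

# Alternative (assumption): total per district and year,
# so the months of one year sum to 1
year_total = toy.groupby(["PdDistrict", "Year"])["count"].transform("sum")
toy["share_of_year"] = toy["count"] / year_total

print(toy)
```

Which normalization is appropriate depends on the question. The first compares years within a calendar month, while the second compares months within a year and is the more direct way to look at seasonality.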

Analysing Vandalism Trends over time across SF Districts¶

We have studied temporal patterns in SF's vandalism crimes through a time-series plot showing hourly occurrences for each day of the week (Figure 1), a heatmap movie depicting the trend in incidents for each month over the years (Figure 2), and an interactive plot showing monthly occurrences in each SF district over the years (Figure 3).

From these plots we observe that districts such as Mission, Northern and Ingleside are vandalism hotspots, whereas Southern, Bayview and Central have had consistently fewer occurrences. In recent years, cases have increased in the Mission, Tenderloin and Central districts, making them relatively less safe in terms of vandalism. Increased activity from Friday afternoon until Sunday night is expected and could be explained by more nightlife, more alcohol and substance use, and lower surveillance while businesses and offices are closed. Intoxication is a plausible reason for younger groups out on weekends to vandalise property, whether out of personal spite or purely for fun. During the daytime, visibility is high, group activities such as parties, gatherings and substance consumption are less common, and police patrolling is higher, which together may reduce activity during office hours.

Conclusion¶

The analysis shows that vandalism in San Francisco depends on both time and place. We see higher incidents on weekends and during evening hours. We also find hotspots that stay active for many years. Some neighborhoods report high activity, but others show improvement over time.

These patterns can guide policymakers, law enforcement, and local communities on where and when to act. They can add more surveillance, hold public awareness events, or improve street lighting. This data makes it easier to build safer neighborhoods and meet community needs.

References¶

[1] San Francisco Public Works. Annual Report 2007–2008[EB/OL]. [2008]. Available: https://sfpublicworks.org/sites/default/files/90_AnnualRpt_0209.pdf

[2] "S.F. car break-ins up 31 percent, nearly triple in 5 years"[EB/OL]. SFGATE, 2016-03-15. Available: https://www.sfgate.com/crime/article/S-F-car-break-ins-up-31-percent-nearly-triple-6894503.php

[3] "Why Can't San Francisco Stop Its Epidemic of Window Smashing?"[EB/OL]. The Atlantic, 2016-04-12. Available: https://www.theatlantic.com/politics/archive/2016/04/san-francisco-crime-policy/479880/

[4] San Francisco Public Works Code, Article 23: Graffiti Removal and Abatement[EB/OL]. Available: https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_publicworks/0-0-0-4912

[5] "Vandals Adding To Struggling San Francisco Chinatown Businesses COVID Recovery Woes"[EB/OL]. CBS San Francisco, 2021-06-14. Available: https://www.cbsnews.com/sanfrancisco/news/san-francisco-chinatown-vandals-businesses-covid-recovery-woes/

[6] San Francisco Police Department. (n.d.). Crime Dashboard. https://www.sanfranciscopolice.org/stay-safe/crime-data/crime-dashboard

[7] San Francisco Government. Street and Sidewalk Standards Annual Report Fiscal Year 2024[EB/OL]. [2024-12-04]. Available: https://www.sf.gov/reports--december-2024--street-and-sidewalk-standards-annual-report-fiscal-year-2024.

[8] San Francisco Chronicle. (2024, January 5). San Francisco violent, property crime fell to 20-year low in 2024. https://www.sfchronicle.com/crime/article/san-francisco-2024-data-20020378.php

Contributions¶

Srijita Sarkar (s242527) - SS¶
Zhanhui Qu (s242603) - ZQ¶
Aikaterini Laskaraki (s242809) - KL¶
Task                          SS    ZQ    KL
Plots and Figure Captions     80%   10%   10%
Creating the Website          10%   10%   80%
Introduction and Conclusion   10%   80%   10%